Abstract: Suppose that there is a ground set which consists of a large number of vectors in a Hilbert space. Consider the problem of selecting a subset of the ground set such that the projection of a vector of interest onto the subspace spanned by the vectors in the chosen subset reaches the maximum norm. This problem is generally NP-hard, and alternative approximation algorithms such as forward regression and orthogonal matching pursuit have been proposed as heuristic approaches. In this paper, we investigate bounds on the performance of these algorithms by introducing the notions of elemental curvatures. More specifically, we derive lower bounds, as functions of these elemental curvatures, for the performance of the aforementioned algorithms with respect to that of the optimal solution under uniform and non-uniform matroid constraints, respectively. We show that if the elements in the ground set are mutually orthogonal, then these algorithms are optimal when the matroid is uniform, and they achieve at least 1/2-approximations of the optimal solution when the matroid is non-uniform.


I. Introduction 

Consider the Hilbert space $L^2(\mu)$ of square-integrable random variables, with $\mu$ the probability measure. Let $X$ be a ground set of vectors and $\eta$ be the vector of interest in $L^2(\mu)$. Let $I$ be a non-empty collection of subsets of $X$, or equivalently, a subset of the power set $2^X$. For any set $E \in I$, we use $\mathrm{span}(E)$ to denote the subspace spanned by the vectors in $E$. We use $\mathcal{P}_\eta(E)$ to denote the projection of $\eta$ onto $\mathrm{span}(E)$. The goal is to choose an element $E$ in $I$ such that the squared norm of $\mathcal{P}_\eta(E)$ is maximized, i.e.,

maximize $\|\mathcal{P}_\eta(E)\|^2$
subject to $E \in I$. (1)
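To make the objective in (1) concrete, the following minimal sketch computes $\|\mathcal{P}_\eta(E)\|^2$ for vectors in a finite-dimensional space. It assumes the ordinary Euclidean inner product as a stand-in for the inner product of $L^2(\mu)$; the helper names (`dot`, `orthonormalize`, `proj_sq_norm`) are illustrative and not taken from the paper.

```python
import math


def dot(u, v):
    """Euclidean inner product <u|v>, standing in for the L^2(mu) inner product."""
    return sum(a * b for a, b in zip(u, v))


def orthonormalize(vectors, tol=1e-12):
    """Modified Gram-Schmidt: an orthonormal basis of span(vectors)."""
    basis = []
    for v in vectors:
        w = list(v)
        for q in basis:
            c = dot(w, q)
            w = [wi - c * qi for wi, qi in zip(w, q)]
        norm = math.sqrt(dot(w, w))
        if norm > tol:  # drop vectors that already lie in the span
            basis.append([wi / norm for wi in w])
    return basis


def proj_sq_norm(eta, E):
    """||P_eta(E)||^2: squared norm of the projection of eta onto span(E)."""
    return sum(dot(eta, q) ** 2 for q in orthonormalize(E))


eta = [3.0, 4.0]
print(proj_sq_norm(eta, [[1.0, 0.0]]))              # projection onto the first axis
print(proj_sq_norm(eta, [[1.0, 0.0], [1.0, 1.0]]))  # span is all of R^2
```

Because the Gram-Schmidt step discards vectors already in the span, linearly dependent subsets $E$ are handled without special cases, and the empty set gives 0.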

A. Motivating Examples 

The above formulation has vast applications in statistical signal processing [...], such as maximizing the quadratic covariance bound, sensor selection for minimizing the mean squared error, and sparse approximation for compressive sensing. Here we briefly introduce a few examples.

1) Quadratic Covariance Bound.

Let $p_\theta$ be the underlying probability measure associated with parameter $\theta$ lying in the parameter space $\Theta$. The problem of interest is to estimate $g(\theta)$, where $g: \Theta \to \mathbb{R}$ is a bounded known function. Let $\hat{g} \in L^2(p_\theta)$ be an unbiased estimator of $g(\theta)$ and $\tilde{g} = \hat{g} - g(\theta) \in L^2(p_\theta)$ be the estimation error, which is the vector of interest $\eta$. For any set $E$ of score functions, the variance of any unbiased estimator is lower bounded by the squared norm of the projection of the estimation error $\tilde{g}$ onto $\mathrm{span}(E)$. This fact is also known as the quadratic covariance bound [...]:

$\mathrm{Variance}[\hat{g}] = \|\tilde{g}\|^2 \ge \|\mathcal{P}_\eta(E)\|^2$, (2)

where $\|\tilde{g}\|^2 = \mathbb{E}_\theta[\tilde{g}^2]$ and $\mathbb{E}_\theta$ denotes the expectation with respect to the measure $p_\theta$. The well-known Cramér–Rao bounds [...], Bhattacharyya bounds [...], and Barankin bounds [...] are essentially special cases of the quadratic covariance bound, obtained by substituting $E$ with specific sets of score functions. For example, the score function for the Cramér–Rao bound is simply $\partial \ln d(x;\theta)/\partial\theta$, where $d(x;\theta)$ denotes the probability density function of the measurement $x$. While these established bounds provide insight into the performance of unbiased estimators, the corresponding score functions do not necessarily provide the tightest bounds on the estimator variance. Moreover, deriving bounds such as the Cramér–Rao bound requires the inverse or pseudo-inverse of the Fisher information matrix, which can be computationally impractical for a large number or dimension of unknown parameters [...]. Finally, a necessary condition for computing these bounds is that the probability density function and its partial derivatives are well-defined. For these reasons, other score functions might be more suitable for providing the lower bound. Suppose that there exists a large set $X$ of candidate score functions in $L^2(p_\theta)$. We aim to choose an optimal subset $E \subset X$ which maximizes $\|\mathcal{P}_\eta(E)\|^2$ and hence provides the tightest bound on the variance of unbiased estimators.

2) Linear Minimum Mean Squared Error Estimator.
Suppose that there is a large set of sensors, each of which makes a zero-mean and square-integrable random observation. These sensor observations are not necessarily independent. The goal is to select a subset of the sensors such that the mean squared error for estimating the parameter of interest $\eta$ is minimized. It is well known that the orthogonality principle implies that the Linear Minimum Mean Squared Error (LMMSE) estimator, denoted by $\hat{\eta}_{\mathrm{LMMSE}}$, is the projection of $\eta$ onto the subspace spanned by a selected subset $E$ [...]. The problem of interest is how to choose $E$ from the set $X$ of all sensor observations such that the mean squared error $\mathbb{E}[(\hat{\eta}_{\mathrm{LMMSE}} - \eta)^2]$ is minimized. Since the estimation error is orthogonal to $\mathrm{span}(E)$, this mean squared error equals $\|\eta\|^2 - \|\mathcal{P}_\eta(E)\|^2$, so minimizing it is equivalent to maximizing the projection of $\eta$ onto $\mathrm{span}(E)$. Another approach to this sensor selection problem is to maximize the information gain and apply submodularity to bound the performance of greedy algorithms [...]. When the criterion is the mean squared error, the objective function is in general not submodular, which makes it difficult to quantify the performance of greedy algorithms.
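The identity behind the LMMSE example can be checked on a toy model. The sketch below assumes a finite sample space of equally likely outcomes, so that each zero-mean random variable is simply the list of its values and $\langle a|b\rangle = \mathbb{E}[ab]$; the numbers are made up for illustration and the function names are hypothetical.

```python
import math


def inner(a, b):
    """<a|b> = E[ab] on a finite sample space with equally likely outcomes."""
    return sum(x * y for x, y in zip(a, b)) / len(a)


def lmmse_estimate(eta, sensors, tol=1e-12):
    """Project eta onto span(sensors) using Gram-Schmidt w.r.t. E[ab]."""
    basis = []
    for v in sensors:
        w = list(v)
        for q in basis:
            c = inner(w, q)
            w = [wi - c * qi for wi, qi in zip(w, q)]
        n = math.sqrt(inner(w, w))
        if n > tol:
            basis.append([wi / n for wi in w])
    est = [0.0] * len(eta)
    for q in basis:
        c = inner(eta, q)
        est = [ei + c * qi for ei, qi in zip(est, q)]
    return est


def mse(eta, sensors):
    """Return (E[(eta_hat - eta)^2], ||eta||^2 - ||P_eta(E)||^2); they should agree."""
    est = lmmse_estimate(eta, sensors)
    err = [e - t for e, t in zip(est, eta)]
    return inner(err, err), inner(eta, eta) - inner(est, est)


# Four equally likely outcomes; each list is one zero-mean random variable.
eta = [1.0, -1.0, 2.0, -2.0]   # parameter of interest
y1 = [1.0, -1.0, 1.0, -1.0]    # sensor 1
y2 = [0.0, 0.0, 1.0, -1.0]     # sensor 2, correlated with sensor 1

for E in ([y1], [y2], [y1, y2]):
    print(mse(eta, E))
```

Adding the correlated second sensor drives the error to zero here only because this toy $\eta$ happens to lie in $\mathrm{span}\{y_1, y_2\}$; in practice the inner products would come from covariances rather than enumerated outcomes.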

3) Sparse Approximation for Compressive Sensing.

Compressive sensing is the problem of recovering a sparse signal using linear compressive measurements (see, e.g., [...]). Let $\eta \in \mathbb{R}^d$ be the measurement signal. We assume that $\eta = \mathbf{H}x$, where $\mathbf{H} \in \mathbb{R}^{d \times n}$ is the measurement matrix. The goal is to find $K$ nonzero components in the $n$-dimensional vector $x$, with $K < d < n$, such that $\mathbf{H}x$ can exactly recover or well approximate $\eta$, i.e.,

minimize $\|\eta - \mathbf{H}x\|^2$
subject to $\|x\|_0 \le K$,

where $\|x\|_0$ denotes the $L_0$-norm of $x$ (the number of its nonzero entries). The geometric interpretation of the above problem is to select $K$ columns of the matrix $\mathbf{H}$ such that the norm of the projection of $\eta$ onto the subspace spanned by the chosen columns is maximized. Adaptive algorithms, such as those based on partially observable Markov decision processes, have been proposed to find the optimal solution [...]. The computational complexity of adaptive algorithms is in general quite high, despite the reduction brought by approximation methods such as rollout.
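The geometric interpretation suggests an exhaustive baseline: try every set of $K$ columns and keep the one with the largest projection. The pure-Python sketch below (columns stored as lists, helper names hypothetical) is meant only to show why the problem is combinatorial, since it evaluates $\binom{n}{K}$ projections; it is not a practical recovery algorithm. It also uses the fact that the least-squares residual on a fixed support equals $\|\eta\|^2$ minus the squared projection.

```python
import math
from itertools import combinations


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def proj_sq_norm(eta, cols, tol=1e-12):
    """Squared norm of the projection of eta onto span(cols) (modified Gram-Schmidt)."""
    basis = []
    for v in cols:
        w = list(v)
        for q in basis:
            c = dot(w, q)
            w = [wi - c * qi for wi, qi in zip(w, q)]
        n = math.sqrt(dot(w, w))
        if n > tol:
            basis.append([wi / n for wi in w])
    return sum(dot(eta, q) ** 2 for q in basis)


def best_k_columns(eta, H_cols, K):
    """Exhaustive search over all C(n, K) column subsets of H."""
    best_val, best_idx = -1.0, None
    for idx in combinations(range(len(H_cols)), K):
        val = proj_sq_norm(eta, [H_cols[i] for i in idx])
        if val > best_val:
            best_val, best_idx = val, idx
    # min ||eta - Hx||^2 over x supported on best_idx
    return best_idx, dot(eta, eta) - best_val


# d = 3 measurements, n = 4 columns (a list of columns).
H_cols = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0], [1.0, 1.0, 1.0]]
eta = [5.0, 3.0, 3.0]   # = 2 * column 0 + 3 * column 3
print(best_k_columns(eta, H_cols, 2))
```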

All the above applications are special cases of the projection maximization problem defined in (1). In general, problem (1) is a combinatorial optimization problem, and it is NP-hard to obtain the optimal solution. Alternative algorithms such as forward regression [...] and orthogonal matching pursuit [20]–[24] have been studied intensively to approximate the optimal solution of (1). Each of these two algorithms starts with an empty set and then incrementally adds one element to the current solution by optimizing a local criterion, while ensuring that the updated solution still belongs to the set of feasible solutions $I$. They are known as greedy approaches¹ because each step is locally optimal, although their local criteria differ. Details are given in Algorithms 1 and 2, respectively. The definition of a matroid will be given in Section II. Moreover, we use $\langle r|s\rangle$ to denote the inner product of $r$ and $s$ in the Hilbert space. Notice that neither algorithm achieves the maximum projection in general. The main purpose of this paper is to quantify their performance with respect to that of the optimal solution. We note that another frequently used approach is through convex relaxation schemes based on sparse eigenvalues or the restricted isometry property [...], [27], although the objective there is usually to minimize the difference between the actual and estimated coefficients of sparse vectors (this corresponds to $L_0$-norm minimization, while (1) deals with the $L_2$-norm).

¹Other variations of greedy approaches have also been proposed and investigated (see, e.g., [25], [...]).


Algorithm 1: Forward Regression
Input: Ground set $X$ and an associated matroid $(X, I)$; vector of interest $\eta$.
Output: An element $E \in I$.
1 begin
2   $E \leftarrow \emptyset$;
3   for $i = 1$ to $K$ do
4     $s^* = \arg\max_{s \in X \setminus E,\ E \cup \{s\} \in I} \|\mathcal{P}_\eta(E \cup \{s\})\|^2$;
5     Update $E \leftarrow E \cup \{s^*\}$;
6   end
7 end

Algorithm 2: Orthogonal Matching Pursuit
Input: Ground set $X$ and an associated matroid $(X, I)$; vector of interest $\eta$.
Output: An element $E \in I$.
1 begin
2   $E \leftarrow \emptyset$;
3   Residual $r = \eta$;
4   for $i = 1$ to $K$ do
5     $s^* = \arg\max_{s \in X \setminus E,\ E \cup \{s\} \in I} |\langle r|s\rangle|$;
6     Update $E \leftarrow E \cup \{s^*\}$;
7     Update $r \leftarrow \eta - \mathcal{P}_\eta(E)$;
8   end
9 end
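Algorithms 1 and 2 can be sketched in Python as follows. The matroid is passed as a membership oracle `independent(E)`, a hypothetical interface chosen here for illustration; vectors live in a finite-dimensional Euclidean space, and ties are broken by index order. None of these choices come from the paper.

```python
import math


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def project(eta, vectors, tol=1e-12):
    """P_eta(E): projection of eta onto span(vectors) via modified Gram-Schmidt."""
    basis = []
    for v in vectors:
        w = list(v)
        for q in basis:
            c = dot(w, q)
            w = [wi - c * qi for wi, qi in zip(w, q)]
        n = math.sqrt(dot(w, w))
        if n > tol:
            basis.append([wi / n for wi in w])
    p = [0.0] * len(eta)
    for q in basis:
        c = dot(eta, q)
        p = [pi + c * qi for pi, qi in zip(p, q)]
    return p


def feasible(X, E, independent):
    """Indices s not in E such that E + [s] is independent in the matroid."""
    return [s for s in range(len(X)) if s not in E and independent(E + [s])]


def forward_regression(X, eta, K, independent):
    """Algorithm 1: add the feasible s maximizing ||P_eta(E u {s})||^2."""
    E = []
    for _ in range(K):
        cands = feasible(X, E, independent)
        if not cands:  # defensive stop; Algorithm 1 assumes K feasible steps
            break

        def score(s):
            p = project(eta, [X[i] for i in E + [s]])
            return dot(p, p)

        E.append(max(cands, key=score))
    return E


def omp(X, eta, K, independent):
    """Algorithm 2: add the feasible s maximizing |<r|s>|, with r = eta - P_eta(E)."""
    E, r = [], list(eta)
    for _ in range(K):
        cands = feasible(X, E, independent)
        if not cands:
            break
        E.append(max(cands, key=lambda s: abs(dot(r, X[s]))))
        p = project(eta, [X[i] for i in E])
        r = [e - pi for e, pi in zip(eta, p)]  # update the residual
    return E


r2 = 1 / math.sqrt(2)
X = [[r2, r2, 0.0, 0.0], [0.0, r2, r2, 0.0], [0.0, 0.0, r2, r2]]
eta = [1.0, 2.0, 2.0, 1.0]


def uniform(E):
    """Uniform matroid oracle with K = 2."""
    return len(E) <= 2


print(forward_regression(X, eta, 2, uniform), omp(X, eta, 2, uniform))
```

Forward regression re-projects onto every candidate subset, whereas OMP only needs one inner product per candidate plus one residual update per step, which is why OMP is cheaper per iteration.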


B. Main Contributions 

The main purpose of this paper is to provide performance bounds for forward regression and orthogonal matching pursuit with respect to the optimal solution. To derive the bounds, we define several notions of elemental curvature, which are inspired by the elemental curvature introduced in [...]. We also illustrate from a geometric perspective how these elemental curvatures are related to principal angles, which are in turn related to the restricted isometry property and mutual incoherence [...]. It turns out that the (near-)optimality of the two aforementioned algorithms is closely related to the mutual (near-)orthogonality of the vectors in the ground set and to the structure of the matroid. Our approach allows the derivation of sharp approximation bounds for these two algorithms in general situations, where the matroid might be uniform or non-uniform. To the best of our knowledge, the non-uniform matroid situation has not been investigated in any previous paper. In particular, in the special case where the vectors in the ground set are mutually orthogonal, these two algorithms are optimal when the matroid is uniform, and they achieve at least 1/2-approximations of the optimal solution when the matroid is non-uniform.

II. Curvatures, Matroid, and Related Work 

In this section, we first introduce two new notions of curvature and review the definition of a matroid. Then we review the literature related to our study. Last, we investigate the notions of curvature from a geometric perspective.

As we shall see from this geometric perspective, the curvatures are essentially metrics that capture the mutual near-orthogonality of the vectors in the ground set. Without loss of generality, throughout the paper we assume that all elements in $X$ are normalized, i.e., $\|t\| = 1$ for any $t \in X$. Let $t^\perp(E)$ and $\bar{t}(E)$ be the normalized orthogonal and parallel components of $t$ with respect to $\mathrm{span}(E)$ (abbreviated as $t^\perp$ and $\bar{t}$ unless otherwise specified):

$t = t^\perp \sin\psi + \bar{t}\cos\psi$,

where $\psi$ denotes the angle between $t$ and $\mathrm{span}(E)$.

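We define the forward elemental curvature, denoted by $\kappa$, as follows:

$\kappa = \max_{E, s, t} \dfrac{\|\mathcal{P}_\eta(E \cup \{s,t\})\|^2 - \|\mathcal{P}_\eta(E \cup \{s\})\|^2}{\|\mathcal{P}_\eta(E \cup \{t\})\|^2 - \|\mathcal{P}_\eta(E)\|^2}$

subject to $E \subset X$, $s, t \in X \setminus E$, $\mathrm{card}(E) \le 2K - 2$, and $\|\mathcal{P}_\eta(\{s^\perp(E)\})\| \le \|\mathcal{P}_\eta(\{t^\perp(E)\})\|$.

Similarly, we define the backward elemental curvature, denoted by $\hat{\kappa}$, as follows:

$\hat{\kappa} = \max_{E, s, t} \dfrac{\|\mathcal{P}_\eta(E \cup \{s,t\})\|^2 - \|\mathcal{P}_\eta(E \cup \{s\})\|^2}{\|\mathcal{P}_\eta(E \cup \{s\})\|^2 - \|\mathcal{P}_\eta(E)\|^2}$

subject to $E \subset X$, $\mathrm{card}(E) \le 2K - 2$, $s, t \in X \setminus E$, and $\|\mathcal{P}_\eta(\{s^\perp(E)\})\| \ge \|\mathcal{P}_\eta(\{t^\perp(E)\})\|$.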
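For small instances, both curvatures can be evaluated by exhaustive search directly from the definitions above. The sketch assumes Euclidean unit-norm vectors and skips degenerate cases with a zero denominator (the definitions leave those implicit); all names are illustrative. It is exponential in $\mathrm{card}(X)$ and intended only for building intuition.

```python
import math
from itertools import combinations


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def basis_of(vectors, tol=1e-12):
    """Orthonormal basis of span(vectors) (modified Gram-Schmidt)."""
    basis = []
    for v in vectors:
        w = list(v)
        for q in basis:
            c = dot(w, q)
            w = [wi - c * qi for wi, qi in zip(w, q)]
        n = math.sqrt(dot(w, w))
        if n > tol:
            basis.append([wi / n for wi in w])
    return basis


def f(eta, vectors):
    """f(E) = ||P_eta(E)||^2."""
    return sum(dot(eta, q) ** 2 for q in basis_of(vectors))


def perp(s, vectors, tol=1e-12):
    """Normalized orthogonal component s_perp(E), or None if s lies in span(E)."""
    w = list(s)
    for q in basis_of(vectors):
        c = dot(w, q)
        w = [wi - c * qi for wi, qi in zip(w, q)]
    n = math.sqrt(dot(w, w))
    return [wi / n for wi in w] if n > tol else None


def elemental_curvatures(X, eta, K, tol=1e-12):
    """Brute-force forward (kappa) and backward (kappa_hat) elemental curvatures."""
    kappa = kappa_hat = 0.0
    for size in range(2 * K - 1):                  # card(E) <= 2K - 2
        for E in combinations(range(len(X)), size):
            EV = [X[i] for i in E]
            fE = f(eta, EV)
            rest = [i for i in range(len(X)) if i not in E]
            for s in rest:
                for t in rest:
                    if s == t:
                        continue
                    ps, pt = perp(X[s], EV), perp(X[t], EV)
                    if ps is None or pt is None:
                        continue
                    # ||P_eta({s_perp(E)})|| and ||P_eta({t_perp(E)})||
                    a, b = abs(dot(eta, ps)), abs(dot(eta, pt))
                    num = f(eta, EV + [X[s], X[t]]) - f(eta, EV + [X[s]])
                    den_fwd = f(eta, EV + [X[t]]) - fE
                    den_bwd = f(eta, EV + [X[s]]) - fE
                    if a <= b and den_fwd > tol:
                        kappa = max(kappa, num / den_fwd)
                    if a >= b and den_bwd > tol:
                        kappa_hat = max(kappa_hat, num / den_bwd)
    return kappa, kappa_hat


X = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]   # mutually orthogonal
print(elemental_curvatures(X, [1.0, 2.0, 3.0], K=2))
```

On a mutually orthogonal ground set, every marginal gain reduces to a single $\langle\eta|t\rangle^2$, so the forward ratio is always 1, while the backward ratio $\langle\eta|t\rangle^2/\langle\eta|s\rangle^2$ never exceeds 1.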

Notice that both curvatures are ratios of differences of the discrete objective function, analogous to the second-order derivative of a continuous function. In particular, if all the elements in $X$ are mutually orthogonal, then each marginal gain reduces to a single squared inner product $\langle\eta|t\rangle^2$, so that $\kappa = 1$ and $\hat{\kappa} \le 1$. Moreover, it is easy to show that the objective function in (1) is always monotone: suppose that $S \subset T \subset X$. Then, by definition, $\mathrm{span}(S)$ is a subspace of $\mathrm{span}(T)$. Thus we have

$\|\mathcal{P}_\eta(S)\|^2 \le \|\mathcal{P}_\eta(T)\|^2$,

which indicates that $\kappa$ and $\hat{\kappa}$ are always non-negative.

Next we state the definition of a matroid. Let $I$ be a collection of subsets of $X$. We call $(X, I)$ a matroid if it has the hereditary property: for any $S \subset T \subset X$, $T \in I$ implies that $S \in I$; and the augmentation property: for any $S, T \in I$, if $T$ has a larger cardinality than $S$, then there exists $j \in T \setminus S$ such that $S \cup \{j\} \in I$. Furthermore, we call $(X, I)$ a uniform matroid if $I = \{S \subset X : \mathrm{card}(S) \le K\}$ for a given $K$, where $\mathrm{card}(S)$ denotes the cardinality of $S$. Otherwise, $(X, I)$ is a non-uniform matroid. The structure of a matroid captures the feasible combinatorial solutions within the power set of the ground set. Take the sensor selection problem as an example: a uniform matroid constraint means that we can choose any combination of $K$ sensors from all the sensors for the solution, whereas a non-uniform matroid constraint means that only certain combinations of $K$ sensors are feasible solutions. Similarly, in many compressed sensing applications such as [...], we might have prior knowledge that not all combinations of sparsity locations are feasible solutions.
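The two matroid axioms can be checked mechanically on small ground sets. In the sketch below (illustrative helper names, pure Python), an independence family is a set of frozensets. A partition matroid, which allows at most a given number of elements from each block, serves as a simple non-uniform example in the spirit of the sensor-selection constraint described above.

```python
from itertools import combinations


def is_matroid(indep):
    """Check the hereditary and augmentation properties of a family of frozensets."""
    # Hereditary property: every subset of an independent set is independent.
    for T in indep:
        for k in range(len(T)):
            for S in combinations(T, k):
                if frozenset(S) not in indep:
                    return False
    # Augmentation property: a smaller independent set can grow from a larger one.
    for S in indep:
        for T in indep:
            if len(T) > len(S) and not any(S | {j} in indep for j in T - S):
                return False
    return True


def uniform_matroid(ground, K):
    """I = {S subset of ground : card(S) <= K}."""
    return {frozenset(c) for k in range(K + 1) for c in combinations(ground, k)}


def partition_matroid(blocks, caps):
    """Non-uniform example: at most caps[b] elements from each block b."""
    ground = [x for blk in blocks for x in blk]
    return {
        frozenset(c)
        for k in range(len(ground) + 1)
        for c in combinations(ground, k)
        if all(sum(x in blk for x in c) <= cap for blk, cap in zip(blocks, caps))
    }


ground = ["a", "b", "c", "d"]
print(is_matroid(uniform_matroid(ground, 2)))
print(is_matroid(partition_matroid([["a", "b"], ["c", "d"]], [1, 1])))
# {c} cannot be augmented from {a, b}, so this family violates augmentation.
not_a_matroid = {frozenset(), frozenset("a"), frozenset("b"), frozenset("c"), frozenset("ab")}
print(is_matroid(not_a_matroid))
```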


A. Related Work 


We first review the notion of a submodular set function. Let $X$ be a ground set and $f: 2^X \to \mathbb{R}$ be a function defined on the power set $2^X$. We say that $f$ is submodular if

1) $f$ is non-decreasing: $f(A) \le f(B)$ for all $A \subset B$;

2) $f(\emptyset) = 0$, where $\emptyset$ denotes the empty set (note that we can always substitute $f$ by $f - f(\emptyset)$ if $f(\emptyset) \ne 0$);

3) $f$ has the diminishing-return property: for all $A \subset B \subset X$ and $j \in X \setminus B$, we have $f(A \cup \{j\}) - f(A) \ge f(B \cup \{j\}) - f(B)$.

The optimization problem that aims to find a set in the matroid that maximizes a submodular function is in general not tractable. Many papers have studied the greedy algorithm as an alternative: starting with an empty set, incrementally add to the current solution the element that maximizes the local gain of the objective function, while ensuring that the updated solution still lies in the matroid. Existing studies have shown that the greedy algorithm approximates the optimal solution well. More specifically, Nemhauser et al. [...] showed that the greedy algorithm achieves at least a $(1 - e^{-1})$-approximation for a uniform matroid. Fisher et al. [...] proved that the greedy algorithm provides at least a $1/2$-approximation of the optimal solution for a non-uniform matroid. Moreover, let $\kappa_t$ be the total curvature of the function $f$, which is defined as

$\kappa_t = \max_{j \in X}\left(1 - \dfrac{f(X) - f(X \setminus \{j\})}{f(\{j\}) - f(\emptyset)}\right)$.

Conforti and Cornuéjols [...] showed that the greedy algorithm achieves at least $\frac{1}{\kappa_t}(1 - e^{-\kappa_t})$- and $\frac{1}{1 + \kappa_t}$-approximations of the optimal solution for uniform and non-uniform matroids, respectively. Note that $\kappa_t \in [0, 1]$ for any submodular function, and the greedy algorithm is optimal when $\kappa_t = 0$. Vondrák [...] showed that the continuous greedy algorithm achieves at least a $\frac{1}{\kappa_t}(1 - e^{-\kappa_t})$-approximation for any matroid. On the other hand, Wang et al. [...] provided approximation bounds for the greedy algorithm as a function of the elemental curvature, which generalizes the notion of diminishing return and is defined as

$\kappa_e = \max_{E \subset X,\ i, j \in X \setminus E,\ i \ne j} \dfrac{f(E \cup \{i, j\}) - f(E \cup \{j\})}{f(E \cup \{i\}) - f(E)}$.

Note that the objective function is submodular if and only if $\kappa_e \le 1$. When $\kappa_e < 1$, the lower bound for the greedy approximation is greater than $(1 - e^{-1})$. If $\kappa_e > 1$, then the objective function is not submodular; in this case, the lower bound for the greedy algorithm is derived as a function of the elemental curvature. In [...] and [...], Zhang et al. generalized the notions of total curvature and elemental curvature to string submodular functions, where the objective function value depends on the order of the elements in the set. This framework is further extended to approximate dynamic programming problems by Liu et al. in [38].
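Both curvature notions reviewed above are easy to evaluate by exhaustive search on tiny ground sets. The sketch below takes a set function `f` defined on frozensets and follows the definitions as written here; skipping elements whose marginal gain in the denominator is zero is an assumption about how degenerate cases are handled, and the example functions are made up.

```python
from itertools import combinations


def total_curvature(f, ground):
    """kappa_t = max_j (1 - (f(X) - f(X - {j})) / (f({j}) - f(empty)))."""
    X, empty = frozenset(ground), frozenset()
    return max(
        1 - (f(X) - f(X - {j})) / (f(frozenset([j])) - f(empty))
        for j in ground
        if f(frozenset([j])) > f(empty)          # skip zero denominators
    )


def elemental_curvature(f, ground):
    """kappa_e = max (f(E+{i,j}) - f(E+{j})) / (f(E+{i}) - f(E)) over E and i != j."""
    best = 0.0
    for k in range(len(ground) - 1):             # leave room for two extra elements
        for E in map(frozenset, combinations(ground, k)):
            rest = [x for x in ground if x not in E]
            for i in rest:
                for j in rest:
                    den = f(E | {i}) - f(E)
                    if i != j and den > 0:
                        best = max(best, (f(E | {i, j}) - f(E | {j})) / den)
    return best


WEIGHTS = {"a": 1.0, "b": 2.0, "c": 3.0}


def modular(S):
    """Additive set function: sum of element weights."""
    return sum(WEIGHTS[x] for x in S)


def capped(S):
    """Submodular function that saturates at 2."""
    return min(len(S), 2)


print(total_curvature(modular, "abc"), elemental_curvature(modular, "abc"))
print(total_curvature(capped, "abc"), elemental_curvature(capped, "abc"))
```

The additive function has $\kappa_t = 0$ (greedy is optimal) and $\kappa_e = 1$, while the saturating function reaches the other extreme $\kappa_t = 1$.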
We use $|i\rangle$ to denote the orthonormal basis vectors of the Hilbert space, $i = 0, 1, \ldots$. The objective function in (1) is not submodular in general. For example, let $\eta = |0\rangle$, $s = |1\rangle$, and $t = \frac{1}{\sqrt{2}}|0\rangle + \frac{1}{\sqrt{2}}|1\rangle$. Then we have

$\|\mathcal{P}_\eta(\{t\})\|^2 - \|\mathcal{P}_\eta(\emptyset)\|^2 = \frac{1}{2}$;

$\|\mathcal{P}_\eta(\{s,t\})\|^2 - \|\mathcal{P}_\eta(\{s\})\|^2 = 1 > \frac{1}{2}$.

Evidently the diminishing-return property does not hold in this case. In fact, the diminishing-return property does not always hold even if all the elements in the ground set are mutually orthogonal. Therefore, the results from classical submodularity theory (e.g., [...]) are not directly applicable to our problem. To address this issue, several notions of approximate submodularity have been introduced to bound the performance of the greedy algorithm. Cevher and Krause [...] showed that the greedy algorithm achieves a good approximation for sparse approximation problems using the approach of approximate submodularity. Das and Kempe [...] improved the approximation bound by introducing the notion of the submodularity ratio. These are powerful results, but with limited extension to non-uniform matroid structures. In this paper, we use the aforementioned notions of curvature to bound the performance of forward regression and orthogonal matching pursuit with respect to the optimal solution, even if the matroid is non-uniform.
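The non-submodularity example can be checked numerically. This sketch encodes $|0\rangle$ and $|1\rangle$ as the standard basis of $\mathbb{R}^2$, an assumption made only for illustration, and reuses a Gram-Schmidt projection helper.

```python
import math


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def proj_sq_norm(eta, E, tol=1e-12):
    """||P_eta(E)||^2 via modified Gram-Schmidt (0 for the empty set)."""
    basis = []
    for v in E:
        w = list(v)
        for q in basis:
            c = dot(w, q)
            w = [wi - c * qi for wi, qi in zip(w, q)]
        n = math.sqrt(dot(w, w))
        if n > tol:
            basis.append([wi / n for wi in w])
    return sum(dot(eta, q) ** 2 for q in basis)


eta = [1.0, 0.0]                           # |0>
s = [0.0, 1.0]                             # |1>
t = [1 / math.sqrt(2), 1 / math.sqrt(2)]   # (|0> + |1>) / sqrt(2)

gain_alone = proj_sq_norm(eta, [t]) - proj_sq_norm(eta, [])          # 1/2 up to rounding
gain_after_s = proj_sq_norm(eta, [s, t]) - proj_sq_norm(eta, [s])    # 1 up to rounding
print(gain_alone, gain_after_s, gain_after_s > gain_alone)
```

Adding $t$ after $s$ gains more than adding $t$ to the empty set, which is exactly the violation of diminishing returns described above.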


B. Geometric Interpretation of Curvatures 

To understand the curvatures from a geometric perspective, we define the principal angle as follows:

$\phi = \min_{E \subset X,\ |E| \le 2K-2,\ s \in X \setminus E} \arccos \|\mathcal{P}_s(E)\|$,

where $\phi \in [0, \pi/2]$. Geometrically speaking, this says that the angle between the subspace spanned by any subset $E$ (with cardinality less than or equal to $2K - 2$) and any element in the set $X \setminus E$ is not smaller than $\phi$. Note that if all the elements in $X$ are mutually orthogonal, then $\phi = \pi/2$.

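We now investigate the relationship between the principal angle and two conditions widely used in compressed sensing to quantify the performance of recovery algorithms, namely restricted isometry and mutual incoherence. Let $\mathbf{H} = [h_1, h_2, \ldots, h_m]$ be the matrix associated with $E = \{h_1, h_2, \ldots, h_m\}$. It is easy to see that

$\cos\phi = \max_{E \subset X,\ |E| \le 2K-2,\ s \in X \setminus E} \|\mathcal{P}_s(E)\|$
$\qquad = \max_{E \subset X,\ |E| \le 2K-2,\ s \in X \setminus E} \|\mathbf{H}(\mathbf{H}^\top\mathbf{H})^{-1}\mathbf{H}^\top s\|$
$\qquad \le \max_{E \subset X,\ |E| \le 2K-2,\ s \in X \setminus E} \|\mathbf{H}(\mathbf{H}^\top\mathbf{H})^{-1}\|\,\|\mathbf{H}^\top s\|$.

The last inequality follows from the definition of the induced (operator) norm. Moreover, we have

$\|\mathbf{H}(\mathbf{H}^\top\mathbf{H})^{-1}\| = \sup_{\|x\|=1}\|\mathbf{H}(\mathbf{H}^\top\mathbf{H})^{-1}x\| = \sqrt{\lambda_{\max}\big((\mathbf{H}(\mathbf{H}^\top\mathbf{H})^{-1})^\top\mathbf{H}(\mathbf{H}^\top\mathbf{H})^{-1}\big)} = \sqrt{\lambda_{\max}\big((\mathbf{H}^\top\mathbf{H})^{-1}\big)} = \Big(\sqrt{\lambda_{\min}(\mathbf{H}^\top\mathbf{H})}\Big)^{-1}$

and

$\|\mathbf{H}^\top s\| = \sqrt{\sum_{i=1}^{m}\langle h_i|s\rangle^2}$.

Thus, we have

$\cos\phi \le \max_{E \subset X,\ |E| \le 2K-2,\ s \in X \setminus E} \sqrt{\dfrac{\sum_{i=1}^{m}\langle h_i|s\rangle^2}{\lambda_{\min}(\mathbf{H}^\top\mathbf{H})}}$. (3)

Here $\lambda_{\min}(\mathbf{H}^\top\mathbf{H})$ denotes the minimum eigenvalue of the correlation (Gram) matrix $\mathbf{H}^\top\mathbf{H}$, which is closely related to the restricted isometry property. The summation of squared inner products is upper bounded by $m$ times the squared mutual incoherence.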
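A small numerical check of (3): the sketch below computes $\cos\phi$ by brute force together with the right-hand side of (3). To keep it short, it is restricted to $K = 2$ (so $|E| \le 2$), which lets the minimum eigenvalue of the at most $2 \times 2$ Gram matrix be written in closed form. Vectors are Euclidean and unit-norm, and linearly dependent subsets $E$ are skipped; these restrictions are simplifications for the example only.

```python
import math
from itertools import combinations


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def proj_norm(s, vectors, tol=1e-12):
    """||P_s(E)|| via modified Gram-Schmidt."""
    basis = []
    for v in vectors:
        w = list(v)
        for q in basis:
            c = dot(w, q)
            w = [wi - c * qi for wi, qi in zip(w, q)]
        n = math.sqrt(dot(w, w))
        if n > tol:
            basis.append([wi / n for wi in w])
    return math.sqrt(sum(dot(s, q) ** 2 for q in basis))


def lambda_min(G):
    """Smallest eigenvalue of a 1x1 or 2x2 symmetric matrix (closed form)."""
    if len(G) == 1:
        return G[0][0]
    a, b, d = G[0][0], G[0][1], G[1][1]
    return (a + d) / 2 - math.sqrt(((a - d) / 2) ** 2 + b * b)


def principal_angle_k2(X, tol=1e-12):
    """Return (phi, cos(phi), RHS of (3)) for K = 2, i.e. 1 <= |E| <= 2."""
    cos_phi = rhs = 0.0
    for size in (1, 2):
        for E in combinations(range(len(X)), size):
            H = [X[i] for i in E]
            lmin = lambda_min([[dot(h, g) for g in H] for h in H])
            if lmin <= tol:                 # skip linearly dependent E
                continue
            for s in range(len(X)):
                if s in E:
                    continue
                cos_phi = max(cos_phi, proj_norm(X[s], H))
                rhs = max(rhs, math.sqrt(sum(dot(h, X[s]) ** 2 for h in H) / lmin))
    return math.acos(min(cos_phi, 1.0)), cos_phi, rhs


def unit(v):
    n = math.sqrt(dot(v, v))
    return [x / n for x in v]


X = [unit(v) for v in ([1, 0, 0, 0], [1, 1, 0, 0], [0, 1, 1, 0], [0, 0, 1, 1])]
print(principal_angle_k2(X))
```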

Next we present a result that bridges the curvatures and the principal angle.

Theorem 1: The forward and backward elemental curvatures are both upper bounded as

$\max(\kappa, \hat{\kappa}) \le \dfrac{1}{1 - 2\cos\phi}$.

The proof is given in Appendix A. This result is important in cases where the curvatures are difficult to calculate. We can use the principal angle, or an upper bound on $\cos\phi$ such as (3), to bound the curvatures, which in turn provides performance bounds for forward regression and orthogonal matching pursuit.

Next we study the performance of forward regression and orthogonal matching pursuit under uniform and non-uniform matroid constraints. We will occasionally use $f(E)$ to represent $\|\mathcal{P}_\eta(E)\|^2$ in the following sections to simplify notation.


III. Results for Uniform Matroid

In this section, we focus on the case where the matroid is uniform, i.e., $I = \{S \subset X : \mathrm{card}(S) \le K\}$ for a given $K$. We consider two scenarios depending on the mutual orthogonality of the elements in $X$.


A. Orthogonal Scenario 

We call the set $X$ mutually orthogonal if any two non-identical elements in $X$ are orthogonal: $\langle s|t\rangle = 0$ for any $s \ne t \in X$. It is easy to show that forward regression and orthogonal matching pursuit are equivalent when $X$ is mutually orthogonal. It turns out that the optimality of these two algorithms is closely related to the mutual orthogonality of $X$.
Theorem 2: Suppose that $X$ is mutually orthogonal. If $(X, I)$ is a uniform matroid, then forward regression and orthogonal matching pursuit are optimal.

Proof: Let $E = \{e_1, \ldots, e_K\}$ be a subset of $X$ and $\eta$ be the vector of interest. By the Hilbert projection theorem and Pythagoras' theorem, we have

$\|\mathcal{P}_\eta(E)\|^2 = \sum_{i=1}^{K}\langle\eta|e_i\rangle^2$.

It is easy to see that the optimal solution is to choose the $K$ elements of $X$ with the largest projections $\langle\eta|e\rangle^2$, which is exactly what forward regression does. This result is closely related to principal component analysis. ■
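Theorem 2 can be sanity-checked numerically. The sketch below assumes the standard basis of $\mathbb{R}^n$ as the mutually orthogonal ground set and a randomly drawn $\eta$ (both illustrative choices). It runs forward regression with generic projections, compares the result against exhaustive search over all $K$-subsets, and confirms that the greedy choice coincides with the $K$ largest $|\langle\eta|e\rangle|$.

```python
import math
import random
from itertools import combinations


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def proj_sq_norm(eta, E, tol=1e-12):
    """||P_eta(E)||^2 via modified Gram-Schmidt."""
    basis = []
    for v in E:
        w = list(v)
        for q in basis:
            c = dot(w, q)
            w = [wi - c * qi for wi, qi in zip(w, q)]
        n = math.sqrt(dot(w, w))
        if n > tol:
            basis.append([wi / n for wi in w])
    return sum(dot(eta, q) ** 2 for q in basis)


def forward_regression_uniform(X, eta, K):
    """Forward regression under the uniform matroid card(E) <= K."""
    E = []
    for _ in range(K):
        rest = [i for i in range(len(X)) if i not in E]
        E.append(max(rest, key=lambda i: proj_sq_norm(eta, [X[j] for j in E + [i]])))
    return E


def optimum_uniform(X, eta, K):
    """Exhaustive optimum over K-subsets (f is monotone, so |E| = K suffices)."""
    return max(proj_sq_norm(eta, [X[i] for i in E]) for E in combinations(range(len(X)), K))


random.seed(0)
n, K = 6, 3
X = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]   # mutually orthogonal
eta = [random.gauss(0.0, 1.0) for _ in range(n)]
G = forward_regression_uniform(X, eta, K)
top_k = sorted(range(n), key=lambda i: abs(eta[i]), reverse=True)[:K]
print(sorted(G) == sorted(top_k),
      abs(proj_sq_norm(eta, [X[i] for i in G]) - optimum_uniform(X, eta, K)) < 1e-9)
```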

Theorem 2 implies that, to guarantee the optimality of forward regression and orthogonal matching pursuit, we should find an orthonormal basis for $X$. The Gram–Schmidt process can be used to generate an orthonormal basis from the elements in $X$. However, this is in general intractable, especially when $\mathrm{card}(X)$ is large. Moreover, the problem of optimally selecting $K$ elements in $X$ is different from the problem of optimally selecting $K$ orthogonalized elements after applying the Gram–Schmidt process.

Mutual orthogonality depends on the definition of the inner product in the Hilbert space. For example, the Hilbert space defined on Gaussian measures has an orthonormal basis: the (normalized) Hermite polynomials. Other well-known examples include Charlier polynomials for Poisson measures, Laguerre polynomials for Gamma measures, and Legendre polynomials and Fourier bases for uniform measures.
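The Gaussian example can be checked numerically. The sketch assumes the probabilists' normalization $He_n$ (standard normal measure), under which $\mathbb{E}[He_m(Z)He_n(Z)] = n!\,\delta_{mn}$; the polynomials are orthogonal and become orthonormal after dividing by $\sqrt{n!}$. The integral is approximated with a plain trapezoidal rule on a truncated interval, an arbitrary choice for illustration.

```python
import math


def hermite_prob(n, x):
    """Probabilists' Hermite polynomial He_n(x): He_{k+1} = x He_k - k He_{k-1}."""
    h_prev, h = 1.0, x
    if n == 0:
        return h_prev
    for k in range(1, n):
        h_prev, h = h, x * h - k * h_prev
    return h


def gaussian_inner(m, n, half_width=12.0, steps=4000):
    """Approximate E[He_m(Z) He_n(Z)] for Z ~ N(0, 1) with the trapezoidal rule."""
    h = 2.0 * half_width / steps
    total = 0.0
    for i in range(steps + 1):
        x = -half_width + i * h
        weight = 0.5 if i in (0, steps) else 1.0
        total += weight * hermite_prob(m, x) * hermite_prob(n, x) * math.exp(-x * x / 2)
    return total * h / math.sqrt(2.0 * math.pi)


for m in range(4):
    print([round(gaussian_inner(m, n), 6) for n in range(4)])
```

The printed Gram matrix should be diagonal with entries $0!, 1!, 2!, 3!$, which is the sense in which "mutual orthogonality" is relative to the chosen measure.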

The physical meaning of mutual orthogonality differs depending on the context of the problem. Take the quadratic covariance bound problem, for example, and consider the uniform distribution parameterized by its mean $\theta$: $\mathrm{Uniform}[-\pi + \theta, \pi + \theta]$. The Cramér–Rao bound is not applicable here because the derivative of the probability density function is not well-defined. On the other hand, the Fourier basis $\{\cos(m(x - \theta))\}_{m \ge 1}$ is a well-defined orthogonal basis (orthonormal after scaling by $\sqrt{2}$). These basis functions can be considered as energy eigenstates of a quantum particle in an infinite potential well. Another example is the Bhattacharyya bound with the following Bhattacharyya score functions:

$\dfrac{\partial \ln d(x,\theta)}{\partial\theta},\ \dfrac{1}{d(x,\theta)}\dfrac{\partial^2 d(x,\theta)}{\partial\theta^2},\ \ldots,\ \dfrac{1}{d(x,\theta)}\dfrac{\partial^k d(x,\theta)}{\partial\theta^k}$,

where $d(x,\theta)$ denotes the probability density function of the measurement $x$. In general, these score functions are not orthonormal. Moreover, the projection of the estimation error onto the first-order partial derivative is not necessarily the largest, meaning that the Fisher score is not necessarily optimal. However, in the Gaussian measure case, the Bhattacharyya score functions turn out to be the Hermite polynomials and are therefore mutually orthogonal. For the LMMSE problem, mutual orthogonality means that all the sensor measurements are mutually uncorrelated. Therefore, if all the sensors generate independent measurement signals, then forward regression and orthogonal matching pursuit are optimal in the uniform matroid case. For the sparse approximation problem, mutual orthogonality means that all the columns of the measurement matrix are mutually orthogonal, which cannot be true for an under-determined system.


B. Non-orthogonal Scenario 

When $X$ is not mutually orthogonal, forward regression and orthogonal matching pursuit are in general not optimal. We give a counterexample for forward regression; a similar counterexample can be given for orthogonal matching pursuit. Let $X = \{s_1, s_2, s_3\}$, where $s_1 = \frac{1}{\sqrt{2}}(|0\rangle + |1\rangle)$, $s_2 = \frac{1}{\sqrt{2}}(|1\rangle + |2\rangle)$, and $s_3 = \frac{1}{\sqrt{2}}(|2\rangle + |3\rangle)$. Suppose that $\eta = |0\rangle + 2|1\rangle + 2|2\rangle + |3\rangle$, and the objective is to choose a subset $E$ of $X$ with $\mathrm{card}(E) \le 2$ such that the projection of $\eta$ onto $\mathrm{span}(E)$ is maximized. It is easy to check that the optimal solution is to choose the orthogonal pair $s_1$ and $s_3$, and the maximum projection is

$\|\mathcal{P}_\eta(\{s_1, s_3\})\|^2 = \langle\eta|s_1\rangle^2 + \langle\eta|s_3\rangle^2 = \frac{9}{2} + \frac{9}{2} = 9$.

Forward regression, however, is fooled into picking $s_2$ first, because $\langle\eta|s_2\rangle^2 = 8$ is the largest single projection. After that, it chooses either $s_1$ or $s_3$. By the Gram–Schmidt process, the normalized orthogonal component of $s_1$ with respect to $s_2$ is given by

$s_1^\perp = \dfrac{s_1 - \langle s_1|s_2\rangle s_2}{\|s_1 - \langle s_1|s_2\rangle s_2\|} = \frac{1}{\sqrt{6}}(2|0\rangle + |1\rangle - |2\rangle)$.

Therefore,

$\|\mathcal{P}_\eta(\{s_1, s_2\})\|^2 = \|\mathcal{P}_\eta(\{s_1^\perp, s_2\})\|^2 = \langle\eta|s_2\rangle^2 + \langle\eta|s_1^\perp\rangle^2 = 8 + \frac{2}{3} = \frac{26}{3} < 9$.

Hence forward regression is not optimal. Moreover, if $X$ is not mutually orthogonal, then the two algorithms may yield different results, which we discuss in separate subsections.
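The numbers in this counterexample can be verified directly. The sketch below encodes $|0\rangle, \ldots, |3\rangle$ as the standard basis of $\mathbb{R}^4$ (an assumption for illustration) and evaluates the relevant projections.

```python
import math


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def proj_sq_norm(eta, E, tol=1e-12):
    """||P_eta(E)||^2 via modified Gram-Schmidt."""
    basis = []
    for v in E:
        w = list(v)
        for q in basis:
            c = dot(w, q)
            w = [wi - c * qi for wi, qi in zip(w, q)]
        n = math.sqrt(dot(w, w))
        if n > tol:
            basis.append([wi / n for wi in w])
    return sum(dot(eta, q) ** 2 for q in basis)


r2 = 1 / math.sqrt(2)
s1 = [r2, r2, 0.0, 0.0]
s2 = [0.0, r2, r2, 0.0]
s3 = [0.0, 0.0, r2, r2]
eta = [1.0, 2.0, 2.0, 1.0]

singles = [proj_sq_norm(eta, [s]) for s in (s1, s2, s3)]   # forward regression's first step
opt = proj_sq_norm(eta, [s1, s3])                            # optimal pair
greedy = proj_sq_norm(eta, [s2, s1])                         # greedy pair
s1_perp = [2 / math.sqrt(6), 1 / math.sqrt(6), -1 / math.sqrt(6), 0.0]
print(singles, opt, greedy, dot(eta, s1_perp) ** 2)
```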

1) Forward Regression: We first study forward regression when the matroid is uniform, with the maximal cardinality of the sets in $I$ equal to $K$. We use $G_K$ to denote the solution obtained by forward regression and OPT to denote the optimal solution.

Theorem 3 (Uniform matroid): The forward regression algorithm achieves at least a $\left(1 - \left(1 - \frac{1}{\bar{K}}\right)^K\right)$-approximation of the optimal solution:

$f(G_K) \ge \left(1 - \left(1 - \dfrac{1}{\bar{K}}\right)^K\right) f(\mathrm{OPT})$, (4)

where $\bar{K} = \sum_{i=0}^{K-1}\min(\kappa, \hat{\kappa})^i$.

The proof is given in Appendix B. When $\min(\kappa, \hat{\kappa}) \le 1$, the forward regression algorithm achieves at least a $(1 - 1/e)$-approximation of the optimal solution.

2) Orthogonal Matching Pursuit: We first compare the step-wise gains in the objective function between orthogonal matching pursuit and forward regression. Recall that $\eta^\perp$ and $\bar{\eta}$ represent the normalized orthogonal and parallel components of $\eta$ with respect to $\mathrm{span}(E)$:

$\eta = \eta^\perp \sin\psi + \bar{\eta}\cos\psi$,

where $\psi$ denotes the angle between $\eta$ and $\mathrm{span}(E)$. The orthogonal matching pursuit algorithm aims to find an element $t$ that maximizes $|\langle\eta^\perp|t\rangle|$. The forward regression algorithm aims to find an element $s$ that maximizes $|\langle\eta^\perp|s^\perp\rangle|$, where $s^\perp$ denotes the normalized orthogonal component of $s$ with respect to $\mathrm{span}(E)$. Suppose that the angle between $s$ and $\mathrm{span}(E)$ is $\delta(s)$. Note that $\delta(s)$ is lower bounded by the principal angle $\phi$ by definition. Since $\eta^\perp$ is orthogonal to the parallel component of $s$, we have

$\max_{s \in X\setminus E}\langle\eta^\perp|s\rangle^2 = \max_{s \in X\setminus E}\big(\langle\eta^\perp|s^\perp\rangle\sin\delta(s)\big)^2 \ge \sin^2\phi\,\max_{s \in X\setminus E}\langle\eta^\perp|s^\perp\rangle^2$. (5)

Hence, even though orthogonal matching pursuit is not the "greediest" algorithm, its step-wise gain is still within a certain range of that of forward regression, captured by the principal angle. With this observation, we can derive a performance bound for orthogonal matching pursuit. Again, we assume that the matroid is uniform with the maximal cardinality of the sets in $I$ equal to $K$. We use $T_K$ to denote the solution obtained by orthogonal matching pursuit.
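The step-wise comparison behind (5) can be probed numerically at a fixed set $E$. The sketch assumes unit-norm Euclidean vectors and computes the gain $f(E \cup \{s\}) - f(E)$ of the element orthogonal matching pursuit would pick and of the element forward regression would pick. It also reports $\sin^2$ of the smallest angle between a candidate and $\mathrm{span}(E)$, a local stand-in for $\sin^2\phi$ (which is a minimum over all admissible $E$). Names and data are illustrative.

```python
import math
import random


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def basis_of(vectors, tol=1e-12):
    """Orthonormal basis of span(vectors) (modified Gram-Schmidt)."""
    basis = []
    for v in vectors:
        w = list(v)
        for q in basis:
            c = dot(w, q)
            w = [wi - c * qi for wi, qi in zip(w, q)]
        n = math.sqrt(dot(w, w))
        if n > tol:
            basis.append([wi / n for wi in w])
    return basis


def residual(v, basis):
    """v minus its projection onto span(basis); basis must be orthonormal."""
    w = list(v)
    for q in basis:
        c = dot(w, q)
        w = [wi - c * qi for wi, qi in zip(w, q)]
    return w


def step_gains(X, eta, E):
    """(OMP gain, forward-regression gain, min sin^2 of angle to span(E))."""
    basis = basis_of([X[i] for i in E])
    r = residual(eta, basis)                  # eta - P_eta(E)
    rest = [i for i in range(len(X)) if i not in E]

    def gain(i):                              # f(E + {i}) - f(E) = <r | w>^2 / ||w||^2
        w = residual(X[i], basis)
        return dot(r, w) ** 2 / dot(w, w)

    t_omp = max(rest, key=lambda i: abs(dot(r, X[i])))
    g_omp = gain(t_omp)
    g_fr = max(gain(i) for i in rest)
    sin2 = min(dot(residual(X[i], basis), residual(X[i], basis)) for i in rest)
    return g_omp, g_fr, sin2


def unit(v):
    n = math.sqrt(dot(v, v))
    return [x / n for x in v]


random.seed(1)
X = [unit([random.gauss(0.0, 1.0) for _ in range(5)]) for _ in range(6)]
eta = [random.gauss(0.0, 1.0) for _ in range(5)]
for E in ([], [0], [0, 1]):
    g_omp, g_fr, sin2 = step_gains(X, eta, E)
    print(E, g_omp, g_fr, g_omp >= sin2 * g_fr)
```

With $E = \emptyset$ the two picks coincide; as $E$ grows, the OMP gain can fall below the forward-regression gain but stays above $\sin^2$ times it.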

Theorem 4 (Uniform matroid): The orthogonal matching pursuit algorithm achieves at least a $\left(1 - \left(1 - \frac{\sin^2\phi}{\bar{K}}\right)^K\right)$-approximation of the optimal solution:

$f(T_K) \ge \left(1 - \left(1 - \dfrac{\sin^2\phi}{\bar{K}}\right)^K\right) f(\mathrm{OPT})$, (6)

where $\bar{K} = \sum_{i=0}^{K-1}\min(\kappa, \hat{\kappa})^i$.

The proof is given in Appendix C. Notice that the only difference between Theorem 3 and Theorem 4 is the principal angle term $\sin^2\phi$. It is easy to see that the lower bound in (6) is never higher than that in (4), but this does not necessarily mean that $f(T_K) \le f(G_K)$.


IV. Results for Non-uniform Matroid

For non-uniform matroids, the two algorithms are not necessarily optimal even when $X$ is mutually orthogonal. As a counterexample, suppose that $X = \{|0\rangle, |1\rangle, |2\rangle, |3\rangle\}$ and $I = \{\{|0\rangle\}, \{|1\rangle\}, \{|2\rangle\}, \{|3\rangle\}, \{|0\rangle, |1\rangle\}, \{|2\rangle, |3\rangle\}\}$. It is easy to verify that $(X, I)$ is a non-uniform matroid. Let $\eta = \sqrt{1+\epsilon}\,|0\rangle + |2\rangle + |3\rangle$ be the vector of interest, where $\epsilon > 0$. Forward regression ends up with $\{|0\rangle, |1\rangle\}$, while the optimal solution is $\{|2\rangle, |3\rangle\}$. However, notice that

$\dfrac{\|\mathcal{P}_\eta(\{|0\rangle, |1\rangle\})\|^2}{\|\mathcal{P}_\eta(\{|2\rangle, |3\rangle\})\|^2} = \dfrac{1+\epsilon}{2} > \dfrac{1}{2}$.

In this section, we show that 1/2 is a general lower bound for these two algorithms in the non-uniform matroid case when the ground set is mutually orthogonal. This bound surprisingly matches the bound in [...]. However, a significant distinction is that in our paper the submodularity of the objective function, which is required in [33], is no longer necessary.

Next we derive performance bounds for forward regression and orthogonal matching pursuit in the situation where $(X, I)$ is a non-uniform matroid. Before proceeding, we state a lemma that assists in handling the non-uniform matroid constraint. Let $G_i$ and $T_i$ be the forward regression and orthogonal matching pursuit solutions up to step $i$, respectively. Note that the cardinalities of $G_i$ and $T_i$ are $i$.

Lemma 1: Any $E \in I$ with cardinality $K$ can be ordered into $\{e_1, \ldots, e_K\}$ such that for $i = 1, \ldots, K$, we have

$f(G_{i-1} \cup \{e_i\}) - f(G_{i-1}) \le f(G_i) - f(G_{i-1})$

and

$f(T_{i-1} \cup \{e_i\}) - f(T_{i-1}) \le f(T_{i-1} \cup \{g^*\}) - f(T_{i-1})$,

where $g^*$ denotes the element that the forward regression algorithm would add to $T_{i-1}$.

Proof: We prove this lemma by induction in descending order on the index $i$. First consider the sets $E$ and $G_{K-1}$, and notice that $|E| = K > |G_{K-1}|$. By the augmentation property of matroids, there exists an element in $E$, denoted by $e_K$, such that $G_{K-1} \cup \{e_K\} \in I$. By the greedy choice of forward regression, $f(G_K) - f(G_{K-1}) \ge f(G_{K-1} \cup \{e_K\}) - f(G_{K-1})$. Suppose that $f(G_k) - f(G_{k-1}) \ge f(G_{k-1} \cup \{e_k\}) - f(G_{k-1})$ for all $k \ge i$; we want to show that the inequality holds for the index $i - 1$. Consider $G_{i-2}$ and $E \setminus \{e_i, \ldots, e_K\}$, where $e_k$ denotes the element in $E$ for which the claim holds, $k = i, \ldots, K$. Again by the augmentation property of matroids, there exists an element in $E \setminus \{e_i, \ldots, e_K\}$, denoted by $e_{i-1}$, such that $G_{i-2} \cup \{e_{i-1}\} \in I$. By the property of the forward regression algorithm, we know that $f(G_{i-1}) - f(G_{i-2}) \ge f(G_{i-2} \cup \{e_{i-1}\}) - f(G_{i-2})$. This concludes the induction proof.

The proof for orthogonal matching pursuit follows a similar argument and is omitted for the sake of brevity. ■

3) Forward Regression: In this subsection, we state the result for forward regression under the non-uniform matroid constraint. We first state a lemma.

Lemma 2: For $i = 2, \ldots, K$, we have

$f(G_i) - f(G_{i-1}) \le \hat{\kappa}\,\big(f(G_{i-1}) - f(G_{i-2})\big)$.


Proof: Let $G_i = \{g_1, \ldots, g_i\}$, where $g_j$ denotes the element added by the forward regression algorithm at step $j$. We know that $G_{i-2} \cup \{g_i\} \in I$ because of the hereditary property of the matroid. Moreover, by the greedy rule of forward regression, adding $g_{i-1}$ at stage $i - 1$ increases the projection of $\eta$ at least as much as adding $g_i$ would. Then, by the definition of the backward elemental curvature (with $E = G_{i-2}$, $s = g_{i-1}$, and $t = g_i$), we obtain the desired result. ■

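Next we present the performance bound for forward regression in the non-uniform matroid scenario.

Theorem 5 (Non-uniform matroid): The forward regression algorithm achieves at least a $\frac{1}{1 + a(\kappa, \hat{\kappa})\,b(\hat{\kappa})}$-approximation of the optimal solution:

$f(G_K) \ge \dfrac{1}{1 + a(\kappa, \hat{\kappa})\,b(\hat{\kappa})}\, f(\mathrm{OPT})$,

where $a(\kappa, \hat{\kappa}) = \max(\kappa, \hat{\kappa})$ if $\max(\kappa, \hat{\kappa}) \le 1$ and $a(\kappa, \hat{\kappa}) = \max(\kappa, \hat{\kappa})^K$ if $\max(\kappa, \hat{\kappa}) > 1$; $b(\hat{\kappa}) = \hat{\kappa}^{K-1}$ if $\hat{\kappa} > 1$ and $b(\hat{\kappa}) = 1$ if $\hat{\kappa} \le 1$.

The proof is given in Appendix D.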
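4) Orthogonal Matching Pursuit: Next we derive the bound for orthogonal matching pursuit for the case where $(X, I)$ is a non-uniform matroid. To do so, we first define the OMP elemental curvature as follows:

$\check{\kappa} = \max_{E, s, t} \dfrac{\|\mathcal{P}_\eta(E \cup \{s,t\})\|^2 - \|\mathcal{P}_\eta(E \cup \{s\})\|^2}{\|\mathcal{P}_\eta(E \cup \{s\})\|^2 - \|\mathcal{P}_\eta(E)\|^2}$

subject to $E \subset X$, $\mathrm{card}(E) \le 2K - 2$, $s, t \in X \setminus E$, and $|\langle\eta^\perp(E)|s\rangle| \ge |\langle\eta^\perp(E)|t\rangle|$.

Again, we can provide an upper bound for the OMP elemental curvature using principal angles. Note that $|\langle\eta^\perp|s\rangle| \ge |\langle\eta^\perp|t\rangle|$ implies that

$\dfrac{|\langle\eta^\perp|t^\perp\rangle|}{|\langle\eta^\perp|s^\perp\rangle|} \le \dfrac{1}{\sin\phi}$,

because $|\langle\eta^\perp|t^\perp\rangle|\sin\phi \le |\langle\eta^\perp|t\rangle| \le |\langle\eta^\perp|s\rangle| \le |\langle\eta^\perp|s^\perp\rangle|$. Using this relation, we can show that $\check{\kappa}$ is also upper bounded by a function of the principal angle $\phi$. Similar to the technique in Theorem 1, we can further bound the curvature using (3). Next we state our result in the non-uniform matroid case.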

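Theorem 6 (Non-uniform matroid): Orthogonal matching pursuit achieves at least a $\frac{1}{1 + a(\kappa, \hat{\kappa}, \check{\kappa})\,b(\hat{\kappa})\sin^{-2}\phi}$-approximation of the optimal solution:

$f(T_K) \ge \dfrac{1}{1 + a(\kappa, \hat{\kappa}, \check{\kappa})\,b(\hat{\kappa})\sin^{-2}\phi}\, f(\mathrm{OPT})$,

where $a(\kappa, \hat{\kappa}, \check{\kappa}) = \max(\kappa, \hat{\kappa}, \check{\kappa})$ if $\max(\kappa, \hat{\kappa}, \check{\kappa}) \le 1$ and $a(\kappa, \hat{\kappa}, \check{\kappa}) = \max(\kappa, \hat{\kappa}, \check{\kappa})^K$ if $\max(\kappa, \hat{\kappa}, \check{\kappa}) > 1$; $b(\hat{\kappa}) = \hat{\kappa}^{K-1}$ if $\hat{\kappa} > 1$ and $b(\hat{\kappa}) = 1$ otherwise.

The proof is given in Appendix E. Note that when $X$ is mutually orthogonal, $\sin\phi = \max(\kappa, \hat{\kappa}, \check{\kappa}) = 1$. An immediate result follows.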

Corollary 1: Suppose that $X$ is mutually orthogonal. Then,

1) Forward regression is equivalent to orthogonal matching pursuit;

2) If $(X, I)$ is a non-uniform matroid, then forward regression (and hence orthogonal matching pursuit) achieves at least a 1/2-approximation of the optimal solution.

Recall that when $X$ is mutually orthogonal, we have shown in Section II-A that these two algorithms are optimal when $(X, I)$ is a uniform matroid. For a non-uniform matroid, they are not necessarily optimal. However, these two algorithms achieve at least 1/2-approximations of the optimal solution. Our results extend those in [?] from submodular functions to a more general class of objective functions.
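Part 1 of Corollary 1 can be illustrated directly. The sketch below runs forward regression (add the element maximizing $f$) and OMP (add the element most correlated with the residual) under a plain cardinality constraint, which is an assumption made for simplicity; the orthonormal ground set and all names are hypothetical.

```python
import math


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def orthonormal_basis(vectors, tol=1e-12):
    """Gram-Schmidt: an orthonormal basis of span(vectors)."""
    basis = []
    for v in vectors:
        w = list(v)
        for q in basis:
            c = dot(w, q)
            w = [wi - c * qi for wi, qi in zip(w, q)]
        n = math.sqrt(dot(w, w))
        if n > tol:
            basis.append([wi / n for wi in w])
    return basis


def f(eta, E):
    """f(E) = ||P_eta(E)||^2."""
    return sum(dot(eta, q) ** 2 for q in orthonormal_basis(E))


def residual(eta, E):
    """eta^perp(E): component of eta orthogonal to span(E)."""
    r = list(eta)
    for q in orthonormal_basis(E):
        c = dot(eta, q)
        r = [ri - c * qi for ri, qi in zip(r, q)]
    return r


def forward_regression(eta, X, K):
    chosen = []
    for _ in range(K):
        cands = [i for i in range(len(X)) if i not in chosen]
        # Add the element giving the largest f after it is added.
        chosen.append(max(cands, key=lambda i: f(eta, [X[j] for j in chosen] + [X[i]])))
    return chosen


def omp(eta, X, K):
    chosen = []
    for _ in range(K):
        r = residual(eta, [X[j] for j in chosen])
        cands = [i for i in range(len(X)) if i not in chosen]
        # Add the element most correlated with the current residual.
        chosen.append(max(cands, key=lambda i: abs(dot(r, X[i]))))
    return chosen


# Hypothetical mutually orthogonal ground set: an orthonormal basis of R^3.
Q = orthonormal_basis([[1, 1, 0], [1, -1, 1], [0, 1, 2]])
eta = [0.5, -1.0, 2.0]
print(forward_regression(eta, Q, 2), omp(eta, Q, 2))  # [1, 2] [1, 2]
```

For orthonormal elements the residual correlation $|\langle \eta^\perp | x\rangle|$ equals $|\langle \eta | x\rangle|$ and the gain of $x$ equals its square, so both rules rank candidates identically.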

Suppose that $X$ is not mutually orthogonal but close to it, in the sense that the principal angle $\phi$ is almost equal to $\pi/2$. We use $\delta = \pi/2 - \phi$ to denote the gap between $\phi$ and $\pi/2$. Moreover, we assume that $\delta$ is sufficiently small that we only have to keep first-order terms in the Taylor expansions:

$$\frac{1}{1 - 2\cos(\pi/2 - \delta)} \approx 1 + 2\delta \qquad\text{and}\qquad \kappa^{K-1} \approx 1 + (K-1)\,|\kappa - 1|.$$

Then, in the case of non-uniform matroid constraints, the lower bounds in Theorems 5 and 6 for the aforementioned algorithms scale as

$$\frac{1}{2 + 2(2K-1)\delta},$$

which indicates that the lower bound decays inversely linearly in the cardinality constraint $K$ and in the gap $\delta$ between the principal angle and $\pi/2$. Fortunately, $K$ is usually a small number (for example, the number of sparsity locations in a compressive sensing problem).
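The quality of the first-order approximation can be checked numerically. This sketch assumes, as the first expansion above suggests, that every elemental curvature is replaced by $1/(1 - 2\cos\phi)$, and uses the Theorem 6 form of the bound (its $\sin^{-2}\phi$ factor contributes only at second order); the function names are hypothetical.

```python
import math


def ratio_exact(delta, K):
    """Theorem 6-form bound with every curvature set to 1/(1 - 2 cos(phi)), phi = pi/2 - delta."""
    phi = math.pi / 2 - delta
    m = 1.0 / (1.0 - 2.0 * math.cos(phi))  # assumed common value of the curvatures
    a = m if m <= 1 else m ** K
    b = m ** (K - 1) if m > 1 else 1.0
    return 1.0 / (1.0 + a * b / math.sin(phi) ** 2)


def ratio_first_order(delta, K):
    """First-order approximation 1 / (2 + 2(2K - 1) delta)."""
    return 1.0 / (2.0 + 2.0 * (2 * K - 1) * delta)


for K in (2, 5, 10):
    for delta in (1e-3, 1e-2):
        print(K, delta, round(ratio_exact(delta, K), 5), round(ratio_first_order(delta, K), 5))
```

The two columns agree closely while $K\delta$ is small and drift apart as $K\delta$ grows, which is when the dropped higher-order terms start to matter.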


V. Conclusions 

In this paper, we have studied the subspace selection problem for maximizing the projection of a vector of interest. We have introduced several new notions of elemental curvatures, upper bounded by functions of the principal angle. We then derived explicit lower bounds for the performance of forward regression and orthogonal matching pursuit in the cases of uniform and non-uniform matroids. Moreover, we showed that if the elements in the ground set are mutually orthogonal, then these algorithms are essentially optimal under the uniform matroid constraint and they achieve at least 1/2-approximations of the optimal solution under the non-uniform matroid constraint.


Appendix A 
Proof of Theorem 1

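Proof: First consider a subset $E$ of $X$ and two elements $s$ and $t$ in the set $X \setminus E$. We know that $|\langle s|t\rangle| \le \cos\phi$ by the definition of the principal angle. We decompose the two elements into parallel and orthogonal components with respect to $\mathrm{span}(E)$. Let $\phi_1$ and $\phi_2$ be the angles between $s$, $t$ and $\mathrm{span}(E)$, respectively. Then we have

$$s = \cos\phi_1\, \bar{s} + \sin\phi_1\, s^\perp, \qquad t = \cos\phi_2\, \bar{t} + \sin\phi_2\, t^\perp,$$

where $\bar{s}, \bar{t}$ and $s^\perp, t^\perp$ denote the normalized components of $s$ and $t$ parallel and orthogonal to $\mathrm{span}(E)$, respectively. We know that

$$\langle s|t\rangle = \langle \cos\phi_1\, \bar{s} + \sin\phi_1\, s^\perp \,|\, \cos\phi_2\, \bar{t} + \sin\phi_2\, t^\perp\rangle = \cos\phi_1 \cos\phi_2\, \langle \bar{s}|\bar{t}\rangle + \sin\phi_1 \sin\phi_2\, \langle s^\perp|t^\perp\rangle.$$

Therefore, since $\phi_1, \phi_2 \ge \phi$,

$$|\langle s^\perp|t^\perp\rangle| = \frac{|\langle s|t\rangle - \cos\phi_1 \cos\phi_2\, \langle \bar{s}|\bar{t}\rangle|}{\sin\phi_1 \sin\phi_2} \le \frac{\cos\phi + \cos^2\phi}{\sin^2\phi}. \tag{7}$$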
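The decomposition of $\langle s|t\rangle$ and the bound (7) can be checked on a toy instance. The sketch assumes unit-norm $s$, $t$ in $\mathbb{R}^4$ with $\mathrm{span}(E)$ a coordinate axis, and sets $\phi$ to the smallest of the three relevant angles, so that $|\langle s|t\rangle| \le \cos\phi$ and $\phi_1, \phi_2 \ge \phi$ hold by construction; the instance and helper names are hypothetical.

```python
import math


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def normalize(v):
    n = math.sqrt(dot(v, v))
    return [x / n for x in v]


def project(v, basis):
    """Projection of v onto span(basis); basis must be orthonormal."""
    p = [0.0] * len(v)
    for q in basis:
        c = dot(v, q)
        p = [pi + c * qi for pi, qi in zip(p, q)]
    return p


def split(v, basis):
    """For unit v: cos/sin of its angle to span(basis) and the normalized parallel/orthogonal parts."""
    par = project(v, basis)
    perp = [vi - pi for vi, pi in zip(v, par)]
    c, sn = math.sqrt(dot(par, par)), math.sqrt(dot(perp, perp))
    return c, sn, [x / c for x in par], [x / sn for x in perp]


# Hypothetical toy instance in R^4: span(E) is the first coordinate axis.
E = [[1.0, 0.0, 0.0, 0.0]]
s = normalize([1.0, 2.0, 0.5, 0.0])
t = normalize([0.3, 0.4, 2.0, 1.0])

cos1, sin1, s_bar, s_perp = split(s, E)
cos2, sin2, t_bar, t_perp = split(t, E)

# Decomposition of <s|t> into parallel and orthogonal contributions.
lhs = dot(s, t)
rhs = cos1 * cos2 * dot(s_bar, t_bar) + sin1 * sin2 * dot(s_perp, t_perp)

# phi: smallest of the three angles, so |<s|t>| <= cos(phi) and phi_1, phi_2 >= phi.
phi = min(math.acos(min(1.0, abs(lhs))), math.acos(min(1.0, cos1)), math.acos(min(1.0, cos2)))
bound = (math.cos(phi) + math.cos(phi) ** 2) / math.sin(phi) ** 2
print(lhs, rhs)
print(abs(dot(s_perp, t_perp)), bound)
```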


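For the numerator and denominator in the definitions of curvature, using Pythagoras' theorem, it is easy to show that

$$\|\mathcal{P}_\eta(E \cup \{t\})\|^2 - \|\mathcal{P}_\eta(E)\|^2 = \|\mathcal{P}_\eta(\{t^\perp\})\|^2 = |\langle \eta | t^\perp\rangle|^2$$

and

$$\|\mathcal{P}_\eta(E \cup \{s,t\})\|^2 - \|\mathcal{P}_\eta(E \cup \{s\})\|^2 = \|\mathcal{P}_\eta(\{\tilde{t}^\perp\})\|^2 = |\langle \eta | \tilde{t}^\perp\rangle|^2,$$

where $\tilde{t}^\perp$ denotes the normalized orthogonal component of $t$ with respect to $\mathrm{span}(E \cup \{s\})$. By the Gram-Schmidt process, we know that

$$\tilde{t}^\perp = \frac{t^\perp - \langle s^\perp|t^\perp\rangle\, s^\perp}{\sqrt{1 - |\langle s^\perp|t^\perp\rangle|^2}}.$$

Therefore, we obtain

$$\langle \eta | \tilde{t}^\perp\rangle = \frac{\langle \eta | t^\perp\rangle - \langle s^\perp|t^\perp\rangle\, \langle \eta | s^\perp\rangle}{\sqrt{1 - |\langle s^\perp|t^\perp\rangle|^2}}.$$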
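The Pythagoras identities and the Gram-Schmidt expression for $\tilde{t}^\perp$ can be verified numerically. A minimal sketch on a hypothetical instance in $\mathbb{R}^4$, assuming a real inner-product space and unit-norm $s$, $t$; helper names are illustrative only.

```python
import math


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def normalize(v):
    n = math.sqrt(dot(v, v))
    return [x / n for x in v]


def orthonormal_basis(vectors, tol=1e-12):
    """Gram-Schmidt: an orthonormal basis of span(vectors)."""
    basis = []
    for v in vectors:
        w = list(v)
        for q in basis:
            c = dot(w, q)
            w = [wi - c * qi for wi, qi in zip(w, q)]
        n = math.sqrt(dot(w, w))
        if n > tol:
            basis.append([wi / n for wi in w])
    return basis


def orth_part(v, E):
    """Component of v orthogonal to span(E)."""
    w = list(v)
    for q in orthonormal_basis(E):
        c = dot(v, q)
        w = [wi - c * qi for wi, qi in zip(w, q)]
    return w


def f(eta, E):
    """f(E) = ||P_eta(E)||^2."""
    return sum(dot(eta, q) ** 2 for q in orthonormal_basis(E))


# Hypothetical toy instance in R^4.
E = [[1.0, 0.0, 0.0, 0.0]]
s = normalize([1.0, 2.0, 0.5, 0.0])
t = normalize([0.3, 0.4, 2.0, 1.0])
eta = [0.7, -1.2, 0.9, 2.0]

s_perp = normalize(orth_part(s, E))
t_perp = normalize(orth_part(t, E))
t_tilde = normalize(orth_part(t, E + [s]))  # orthogonal to span of E together with s

gain_t = f(eta, E + [t]) - f(eta, E)
gain_t_after_s = f(eta, E + [s, t]) - f(eta, E + [s])
print(gain_t, dot(eta, t_perp) ** 2)
print(gain_t_after_s, dot(eta, t_tilde) ** 2)

# Gram-Schmidt closed form for t_tilde in terms of s_perp and t_perp.
rho = dot(s_perp, t_perp)
gs = [(ti - rho * si) / math.sqrt(1 - rho ** 2) for ti, si in zip(t_perp, s_perp)]
print(max(abs(a - b) for a, b in zip(gs, t_tilde)))
```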


Hence, using (7), we can provide an upper bound on the forward elemental curvature in terms of $\phi$:

$$\kappa \le \frac{1}{1 - 2\cos\phi}.$$

Using a similar argument, we can provide an upper bound of the same form for the backward elemental curvature. The proof is complete. ■


Appendix B
Proof of Theorem 3

Proof: For any $M, N \in I$ with $|M| \le K$ and $|N| = K$, let $J = (M \cup N) \setminus M = \{j_1, \dots, j_r\}$, where $r \le |N|$. We can permute the elements in $J$ so that they are ordered to use the forward elemental curvature. More specifically, let

$$j_i = \arg\min_{j \in J \setminus \{j_1,\dots,j_{i-1}\}} \bigl\|\mathcal{P}_\eta\bigl(\{j^\perp(M \cup \{j_1,\dots,j_{i-1}\})\}\bigr)\bigr\|,$$

where $j^\perp(M \cup \{j_1,\dots,j_{i-1}\})$ denotes the normalized orthogonal component of $j$ with respect to $\mathrm{span}(M \cup \{j_1,\dots,j_{i-1}\})$. Using the definition of the forward elemental curvature, we have

$$\begin{aligned}
f(M \cup N) - f(M) &= \sum_{i=1}^{r} \bigl(f(M \cup \{j_1,\dots,j_i\}) - f(M \cup \{j_1,\dots,j_{i-1}\})\bigr) \\
&\le \sum_{i=1}^{r} \kappa^{i-1} \bigl(f(M \cup \{j_i\}) - f(M)\bigr).
\end{aligned}$$

Therefore, there exists $j \in X$ such that

$$f(M \cup N) - f(M) \le \sum_{i=1}^{|N|} \kappa^{i-1} \bigl(f(M \cup \{j\}) - f(M)\bigr).$$

We use $G_k$ to denote the forward regression solution with cardinality $k$ and OPT to denote the optimal solution, and we write $S = \sum_{l=1}^{K} \kappa^{l-1}$. Using the properties of the forward regression algorithm and the monotone property, we have

$$f(G_i) - f(G_{i-1}) \ge \frac{1}{S}\bigl(f(G_{i-1} \cup \mathrm{OPT}) - f(G_{i-1})\bigr) \ge \frac{1}{S}\bigl(f(\mathrm{OPT}) - f(G_{i-1})\bigr).$$

Therefore, by recursion, we have

$$f(G_K) \ge \frac{1}{S} f(\mathrm{OPT}) + \Bigl(1 - \frac{1}{S}\Bigr) f(G_{K-1}) \ge \frac{1}{S} f(\mathrm{OPT}) \sum_{i=0}^{K-1} \Bigl(1 - \frac{1}{S}\Bigr)^{i} = f(\mathrm{OPT}) \Bigl(1 - \Bigl(1 - \frac{1}{S}\Bigr)^{K}\Bigr).$$

Using a similar argument, we can show a second lower bound on $f(G_K)$ in terms of $f(\mathrm{OPT})$. Combining these two inequalities, the proof is complete. ■


Appendix C
Proof of Theorem 4

Proof: As in the proof of Theorem 3, for any $M, N \in I$ with $|M| \le K$ and $|N| = K$, we order $J = (M \cup N) \setminus M = \{j_1, \dots, j_r\}$ (with $r \le |N|$) by the same arg min rule, where $j^\perp(M \cup \{j_1,\dots,j_{i-1}\})$ again denotes the normalized orthogonal component of $j$ with respect to $\mathrm{span}(M \cup \{j_1,\dots,j_{i-1}\})$. Using the definition of the forward elemental curvature, we have

$$\begin{aligned}
f(M \cup N) - f(M) &= \sum_{i=1}^{r} \bigl(f(M \cup \{j_1,\dots,j_i\}) - f(M \cup \{j_1,\dots,j_{i-1}\})\bigr) \\
&\le \sum_{i=1}^{r} \kappa^{i-1} \bigl(f(M \cup \{j_i\}) - f(M)\bigr).
\end{aligned}$$

Therefore, there exists $j \in X$ such that

$$f(M \cup N) - f(M) \le \sum_{i=1}^{|N|} \kappa^{i-1} \bigl(f(M \cup \{j\}) - f(M)\bigr).$$


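Let $T_i$ denote the orthogonal matching pursuit solution after $i$ stages, and let $g^*$ denote the element that forward regression would add to $T_{i-1}$. Using the properties of the forward regression algorithm, the monotone property, and the $1/\sin\phi$ inequality for orthogonal matching pursuit, we have

$$\begin{aligned}
f(T_i) - f(T_{i-1}) &\ge \sin^2\phi\, \bigl(f(T_{i-1} \cup \{g^*\}) - f(T_{i-1})\bigr) \\
&\ge \frac{\sin^2\phi}{S} \bigl(f(T_{i-1} \cup \mathrm{OPT}) - f(T_{i-1})\bigr) \\
&\ge \frac{\sin^2\phi}{S} \bigl(f(\mathrm{OPT}) - f(T_{i-1})\bigr)
\end{aligned}$$

for $i = 1, \dots, K$, where $S = \sum_{l=1}^{K} \kappa^{l-1}$. Therefore, by recursion, we have

$$f(T_K) \ge \frac{\sin^2\phi}{S} f(\mathrm{OPT}) + \Bigl(1 - \frac{\sin^2\phi}{S}\Bigr) f(T_{K-1}) \ge \frac{\sin^2\phi}{S} f(\mathrm{OPT}) \sum_{i=0}^{K-1} \Bigl(1 - \frac{\sin^2\phi}{S}\Bigr)^{i} = f(\mathrm{OPT}) \Bigl(1 - \Bigl(1 - \frac{\sin^2\phi}{S}\Bigr)^{K}\Bigr).$$


Appendix D
Proof of Theorem 5

Proof: We use a similar approach to that of the proof of Theorem 3. Let $G_i = \{g_1, \dots, g_i\}$, where $g_j$ denotes the element added by the forward regression algorithm at stage $j$. Let $\mathrm{OPT} = \{o_1, \dots, o_K\}$ and assume that its elements are already reordered so that we can use the forward elemental curvature. We know that

$$f(G_K \cup \mathrm{OPT}) - f(G_K) \le \sum_{i=1}^{K} \kappa^{i-1} \bigl(f(G_K \cup \{o_i\}) - f(G_K)\bigr) \le \begin{cases} \sum_{i=1}^{K} \bigl(f(G_K \cup \{o_i\}) - f(G_K)\bigr), & \text{if } \kappa \le 1, \\ \kappa^{K-1} \sum_{i=1}^{K} \bigl(f(G_K \cup \{o_i\}) - f(G_K)\bigr), & \text{if } \kappa > 1. \end{cases}$$

From Lemma 1, we know that OPT can be ordered into $\{d_1, \dots, d_K\}$ such that

$$f(G_{i-1} \cup \{d_i\}) - f(G_{i-1}) \le f(G_i) - f(G_{i-1})$$

for $i = 1, \dots, K$. Moreover, we know that $G_{i-2} \cup \{g_i\} \in I$ because of the hereditary property of the matroid, and that, by the property of the forward regression algorithm, the projection of $\eta$ gains more by adding $g_{i-1}$ than by adding $g_i$ at stage $i-1$. Using Lemmas 1 and 2 and the definitions of the forward and backward elemental curvatures, we obtain

$$f(G_K \cup \{d_i\}) - f(G_K) \le \begin{cases} \kappa \bigl(f(G_{K-1} \cup \{d_i\}) - f(G_{K-1})\bigr), & \text{if } \|\mathcal{P}_\eta(\{d_i\})\| \ge \|\mathcal{P}_\eta(\{g_i\})\|, \\ \hat{\kappa} \bigl(f(G_K) - f(G_{K-1})\bigr), & \text{if } \|\mathcal{P}_\eta(\{d_i\})\| < \|\mathcal{P}_\eta(\{g_i\})\|. \end{cases} \tag{8}$$

Therefore, by recursion, we have

$$f(G_K \cup \{d_i\}) - f(G_K) \le \max(\kappa,\hat{\kappa})^{K-i+1} \bigl(f(G_i) - f(G_{i-1})\bigr)$$

for $i = 1, \dots, K$. Hence, we obtain

$$\sum_{i=1}^{K} \bigl(f(G_K \cup \{d_i\}) - f(G_K)\bigr) \le \sum_{i=1}^{K} \max(\kappa,\hat{\kappa})^{K-i+1} \bigl(f(G_i) - f(G_{i-1})\bigr) \le \begin{cases} \max(\kappa,\hat{\kappa})\, f(G_K), & \text{if } \max(\kappa,\hat{\kappa}) \le 1, \\ \max(\kappa,\hat{\kappa})^{K} f(G_K), & \text{if } \max(\kappa,\hat{\kappa}) > 1. \end{cases}$$

Therefore, we have

$$f(\mathrm{OPT}) \le \bigl(1 + a(\kappa,\hat{\kappa})\, b(\kappa)\bigr) f(G_K),$$

where $a(\kappa,\hat{\kappa}) = \max(\kappa,\hat{\kappa})$ if $\max(\kappa,\hat{\kappa}) \le 1$ and $a(\kappa,\hat{\kappa}) = \max(\kappa,\hat{\kappa})^{K}$ if $\max(\kappa,\hat{\kappa}) > 1$; $b(\kappa) = \kappa^{K-1}$ if $\kappa > 1$ and $b(\kappa) = 1$ otherwise. ■


Appendix E
Proof of Theorem 6

Proof: Let $T_i = \{t_1, \dots, t_i\}$, where $t_j$ denotes the element added by the orthogonal matching pursuit algorithm at stage $j$, and let $\mathrm{OPT} = \{o_1, \dots, o_K\}$ be ordered such that the forward elemental curvature can be used. We know that

$$f(T_K \cup \mathrm{OPT}) - f(T_K) \le \begin{cases} \sum_{i=1}^{K} \bigl(f(T_K \cup \{o_i\}) - f(T_K)\bigr), & \text{if } \kappa \le 1, \\ \kappa^{K-1} \sum_{i=1}^{K} \bigl(f(T_K \cup \{o_i\}) - f(T_K)\bigr), & \text{if } \kappa > 1. \end{cases}$$

Using Lemma 1 and the $1/\sin\phi$ inequality for orthogonal matching pursuit, we know that OPT can be ordered as $\{d_1, \dots, d_K\}$ such that

$$f(T_{i-1} \cup \{d_i\}) - f(T_{i-1}) \le f(T_{i-1} \cup \{g^*\}) - f(T_{i-1}) \le \frac{f(T_i) - f(T_{i-1})}{\sin^2\phi}$$

for $i = 1, \dots, K$, where $g^*$ denotes the element that forward regression would add to $T_{i-1}$. Next we state a lemma, with its proof, that we will use.

Lemma 3: For $i = 2, \dots, K$, we have

$$f(T_i) - f(T_{i-1}) \le \tilde{\kappa} \bigl(f(T_{i-1}) - f(T_{i-2})\bigr).$$

Proof of Lemma 3: We know that $T_{i-2} \cup \{t_i\} \in I$ because of the hereditary property of the matroid. Therefore, by the property of orthogonal matching pursuit, we have $|\langle \eta^\perp | t_{i-1}\rangle| \ge |\langle \eta^\perp | t_i\rangle|$, where $\eta^\perp = \eta^\perp(T_{i-2})$. Then, by the definition of the OMP elemental curvature, we obtain the inequality in the lemma.

By the definitions of the forward and backward elemental curvatures, we obtain

$$f(T_K \cup \{d_i\}) - f(T_K) \le \begin{cases} \kappa \bigl(f(T_{K-1} \cup \{d_i\}) - f(T_{K-1})\bigr), & \text{if } \|\mathcal{P}_\eta(\{d_i\})\| \ge \|\mathcal{P}_\eta(\{t_i\})\|, \\ \hat{\kappa} \bigl(f(T_K) - f(T_{K-1})\bigr), & \text{if } \|\mathcal{P}_\eta(\{d_i\})\| < \|\mathcal{P}_\eta(\{t_i\})\|. \end{cases} \tag{9}$$

Therefore, by Lemma 3 and recursion, we have

$$f(T_K \cup \{d_i\}) - f(T_K) \le \frac{\max(\kappa,\hat{\kappa},\tilde{\kappa})^{K-i+1}}{\sin^2\phi} \bigl(f(T_i) - f(T_{i-1})\bigr)$$

for $i = 1, \dots, K$. Therefore, we have

$$\sum_{i=1}^{K} \bigl(f(T_K \cup \{d_i\}) - f(T_K)\bigr) \le \sum_{i=1}^{K} \frac{\max(\kappa,\hat{\kappa},\tilde{\kappa})^{K-i+1}}{\sin^2\phi} \bigl(f(T_i) - f(T_{i-1})\bigr) \le \begin{cases} \sin^{-2}\phi\, \max(\kappa,\hat{\kappa},\tilde{\kappa})\, f(T_K), & \text{if } \max(\kappa,\hat{\kappa},\tilde{\kappa}) \le 1, \\ \sin^{-2}\phi\, \max(\kappa,\hat{\kappa},\tilde{\kappa})^{K} f(T_K), & \text{if } \max(\kappa,\hat{\kappa},\tilde{\kappa}) > 1. \end{cases}$$

Therefore, we have

$$f(\mathrm{OPT}) \le \bigl(1 + \sin^{-2}\phi\; a(\kappa,\hat{\kappa},\tilde{\kappa})\, b(\kappa)\bigr) f(T_K),$$

where $a(\kappa,\hat{\kappa},\tilde{\kappa}) = \max(\kappa,\hat{\kappa},\tilde{\kappa})$ if $\max(\kappa,\hat{\kappa},\tilde{\kappa}) \le 1$ and $a(\kappa,\hat{\kappa},\tilde{\kappa}) = \max(\kappa,\hat{\kappa},\tilde{\kappa})^{K}$ if $\max(\kappa,\hat{\kappa},\tilde{\kappa}) > 1$; $b(\kappa) = \kappa^{K-1}$ if $\kappa > 1$ and $b(\kappa) = 1$ otherwise. ■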
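The quantity controlled by Lemma 3 can be inspected on a toy instance. The sketch below runs OMP, forms the ratios $(f(T_i) - f(T_{i-1}))/(f(T_{i-1}) - f(T_{i-2}))$ of consecutive stage gains, and compares them with a brute-force evaluation of $\tilde{\kappa}$ over the constraint set of its definition. It assumes a plain cardinality constraint as the matroid and vectors in $\mathbb{R}^3$ in place of $L^2(\mu)$; the instance and all names are hypothetical.

```python
import itertools
import math


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def normalize(v):
    n = math.sqrt(dot(v, v))
    return [x / n for x in v]


def orthonormal_basis(vectors, tol=1e-12):
    """Gram-Schmidt: an orthonormal basis of span(vectors)."""
    basis = []
    for v in vectors:
        w = list(v)
        for q in basis:
            c = dot(w, q)
            w = [wi - c * qi for wi, qi in zip(w, q)]
        n = math.sqrt(dot(w, w))
        if n > tol:
            basis.append([wi / n for wi in w])
    return basis


def f(eta, E):
    return sum(dot(eta, q) ** 2 for q in orthonormal_basis(E))


def residual(eta, E):
    r = list(eta)
    for q in orthonormal_basis(E):
        c = dot(eta, q)
        r = [ri - c * qi for ri, qi in zip(r, q)]
    return r


def omp_path(eta, X, K):
    """Indices t_1..t_K chosen by OMP under a cardinality constraint."""
    chosen = []
    for _ in range(K):
        r = residual(eta, [X[j] for j in chosen])
        cands = [i for i in range(len(X)) if i not in chosen]
        chosen.append(max(cands, key=lambda i: abs(dot(r, X[i]))))
    return chosen


def omp_curvature(eta, X, K, tol=1e-12):
    """Brute-force maximum of the OMP elemental-curvature ratio."""
    n, best = len(X), 0.0
    for size in range(min(2 * K - 2, n) + 1):
        for E_idx in itertools.combinations(range(n), size):
            E = [X[i] for i in E_idx]
            r, f_E = residual(eta, E), f(eta, E)
            rest = [i for i in range(n) if i not in E_idx]
            for s, t in itertools.permutations(rest, 2):
                if abs(dot(r, X[s])) < abs(dot(r, X[t])):
                    continue  # OMP must prefer s to t
                f_Es = f(eta, E + [X[s]])
                if f_Es - f_E <= tol:
                    continue
                best = max(best, (f(eta, E + [X[s], X[t]]) - f_Es) / (f_Es - f_E))
    return best


X = [normalize(v) for v in ([1, 0, 0], [0.9, 0.3, 0], [0, 1, 0.2], [0.1, 0.2, 1], [0.5, 0.5, 0.5])]
eta = [1.0, 0.6, 0.3]
K = 3
path = omp_path(eta, X, K)
vals = [f(eta, [X[j] for j in path[:i]]) for i in range(K + 1)]  # f(T_0), ..., f(T_K)
gains = [vals[i] - vals[i - 1] for i in range(1, K + 1)]
ratios = [gains[i] / gains[i - 1] for i in range(1, K)]  # bounded by kappa_tilde (Lemma 3)
kappa_tilde = omp_curvature(eta, X, K)
print(path, ratios, kappa_tilde)
```

Each ratio corresponds to the triple $E = T_{i-2}$, $s = t_{i-1}$, $t = t_i$, which satisfies the OMP constraint by construction, so it cannot exceed the brute-force maximum.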